Classification of web robots: An empirical study based on over one billion requests
Abstract
Many studies on the detection and classification of web robots have focused mostly on text crawlers, and their empirical experiments used relatively small data sets collected at universities. In this paper, we analyzed more than one billion requests made to www.microsoft.com over 24 h. Web logs were anonymized to eliminate potential privacy concerns while preserving essential characteristics (e.g., frequency, queries, etc.). We have developed effective characterization metrics, based on workload characteristics and resource types, for detecting and classifying various web robots, including text crawlers, link checkers, and icon crawlers. As expected, web robot behavior was clearly different from that of typical interactive users, and different types of web robots also exhibited different characteristics. However, comparison of web robots of the same type, text crawlers in particular, also revealed different characteristics, thereby enabling characterization with a reasonably high confidence level. We divided the feature metrics into five groups, and the effectiveness of each group in classification is shown in a polar diagram, in decreasing order of effectiveness in the clockwise direction. One can use the findings to infer the likely identity of unknown web robots, and organizations can develop appropriate measures to deal with them. Our analysis is based on recent web log data collected at one of the best-known sites, which offers a truly global service. Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.
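The abstract does not spell out the metrics themselves, so the following is only an illustrative sketch of what a per-client profile built from workload and resource-type features might look like. The extension-to-type mapping, the feature names, the `profile_clients` function, and the use of the HEAD-request share as a signal are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter, defaultdict
import os
from urllib.parse import urlparse

# Hypothetical mapping from file extension to resource type (not from the paper).
RESOURCE_TYPES = {
    ".html": "text", ".htm": "text", ".txt": "text",
    ".ico": "icon",
    ".png": "image", ".jpg": "image", ".gif": "image",
}


def resource_type(url):
    """Classify a requested URL by its file extension; unknown ones are 'other'."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return RESOURCE_TYPES.get(ext, "other")


def profile_clients(requests):
    """Build a simple feature profile per client.

    `requests` is an iterable of (client_id, http_method, url) tuples.
    Returns {client_id: {"requests": n, "share_head": f, "share_<type>": f, ...}}.
    """
    type_counts = defaultdict(Counter)
    head_counts = Counter()
    for client, method, url in requests:
        type_counts[client][resource_type(url)] += 1
        if method == "HEAD":
            head_counts[client] += 1

    profiles = {}
    for client, by_type in type_counts.items():
        total = sum(by_type.values())
        # Share of requests per resource type (a resource-type feature).
        profile = {f"share_{t}": n / total for t, n in by_type.items()}
        # Request volume and HEAD share (workload features, assumed here).
        profile["requests"] = total
        profile["share_head"] = head_counts[client] / total
        profiles[client] = profile
    return profiles


if __name__ == "__main__":
    sample = [
        ("a", "GET", "/favicon.ico"),
        ("a", "GET", "/favicon.ico"),
        ("b", "HEAD", "/index.html"),
        ("b", "GET", "/img/logo.png"),
    ]
    for client, profile in profile_clients(sample).items():
        print(client, profile)
```

In a profile of this kind, a client whose requests are almost entirely icons would stand apart from one that mostly fetches text pages, which is the sort of separation the abstract describes between icon crawlers and text crawlers.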
Similar resources
A density based clustering approach to distinguish between web robot and human requests to a web server
Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of the advantages of robots, they may occupy bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...
Web Robot Detection based on Monotonous Behavior
Several studies examined various features on how to most effectively detect web robots. Based on an insight that most web robots, regardless of specifics, would exhibit focused and therefore monotonous behavior, this paper proposes that monitoring the rate of behavioral change is highly effective in detecting sessions initiated by web robots. Empirical evaluation performed on more than one bill...
Identification and Classification of Desirable Web-Based Services from the Perspective of Website Users of Iran’s Hospitals Based on Kano Model of Customer Satisfaction
Background and Aim: A hospital website is an appropriate system for exchanging information and connecting patients, hospitals and medical staff. The purpose of this study was to identify and classify desirable web-based services in websites of Iran's hospitals based on Kano’s Customer Satisfaction Model. Materials and Methods: This was a survey study. The statistical population of the study co...
Anomaly-based Web Attack Detection: The Application of Deep Neural Network Seq2Seq With Attention Mechanism
Today, the use of the Internet and Internet sites has become an integral part of people’s lives, and most activities and important data reside on Internet websites. Thus, attempts to intrude into these websites have grown exponentially. Intrusion detection systems (IDS) for web attacks are one approach to protecting users. However, these systems suffer from drawbacks such as low accuracy in ...
Fuzzy Motion Control for Wheeled Mobile Robots in Real-Time
Due to the various advantages of Wheeled Mobile Robots (WMRs), many researchers have focused on solving their challenges. The automatic motion control of such robots is an attractive problem and one of the issues that should be examined carefully. In the current paper, the trajectory tracking problem of WMRs actuated by two independent electrical motors is considered. To this end, and ...
Journal title:
- Computers & Security
Volume 28, Issue
Pages -
Publication date 2009